1. INTRODUCTION

A wheeled mobile robot (WMR) is a system that can move from one location to another 
autonomously, without the need for external intervention or assistance [1]. Unlike robotic arms, which can 
only work in a specific space, WMR can move freely within a predetermined workspace to achieve the 
desired goal. Proportional integral derivative (PID) controllers [2], feedback linearization controllers [3], 
backstepping controllers [4], sliding mode controllers [5], [6], adaptive controllers [7], [8], robust controllers 
[9], [10], fuzzy controllers [11], [12], and neural network-based controllers [13], [14] are just a few of the 
control methods proposed for WMR. The WMR is assumed to roll without slipping in these studies. In 
practice, however, due to the presence of nonlinear components such as friction, wheel slip, and so on, some 
studies have added these components to the mathematical model of the WMR to improve accuracy [15]-[18]. 
In [15], [16], the friction and wheel slip components are included in the robot's kinematics and dynamics 
models, and then robust controllers for tracking control are established. Vu et al. [17] present an adaptive
control method based on a disturbance estimator that can compensate for the effects of wheel slip and
external disturbances acting on both kinematic and dynamic loops. Similar control structures for uncertain
WMR with kinematic and dynamic control loops are presented in [18]. However, rather than using two
disturbance observers in the inner and outer loops, which complicates the system, the controllers in [18] are
designed to deal with disturbances using the adaptive type-2 fuzzy control technique. Because the controller
parameters are updated based on the optimal rule to adapt to changes in working conditions, the disturbance
observers are removed.

The control techniques mentioned above achieve high-quality trajectory tracking. Nevertheless, they do not
address the problem of optimizing a performance index that weighs tracking error against control energy.
Reinforcement learning (RL) [19] and adaptive dynamic programming (ADP) [20] are efficient techniques that
use the optimality principle of dynamic programming to address such optimization problems. RL is used to
approximate the solutions of the Hamilton-Jacobi-Bellman (HJB) and Hamilton-Jacobi-Isaacs (HJI) equations,
which are nonlinear partial differential equations and are therefore difficult to solve analytically,
particularly for nonlinear systems. Prior research has employed the conventional ADP control framework
[21], [22], which features two neural networks in an actor-critic (AC) structure but frequently neglects the
impact of disturbances on the system. In this framework, an actor network approximates the optimal control
law, while a critic network evaluates the control law and approximates the optimal cost function.
Subsequently, a number of algorithms have been developed for nonlinear systems that are subject to
disturbances [23]-[28]. The algorithms in [23]-[25] employ an ADP structure with three neural networks, in
which an additional network is added to the AC structure to estimate the upper bound of the noise. A
reinforcement learning-based trajectory tracking controller is proposed in [26] for the autopilot system of
underactuated surface vessels (USVs) subject to input disturbances and input signal constraints. A tracking
error conversion technique handles the error constraint problem, which ensures that the USVs accurately
follow the set trajectory. However, the weights of the actor and critic neural networks are updated
sequentially, which reduces the convergence speed of the parameters. Sun and Liu [27] propose a robust
optimal controller for a rocket autopilot using the ADP technique combined with a nonlinear disturbance
observer and an adaptive sliding mode controller. The AC architecture is used to design an adaptive optimal
controller with only one critic neural network. However, because an identification process is additionally
incorporated, the controller has a high computational complexity and is difficult to implement. An online
adaptive reinforcement learning algorithm is developed in [28] for the optimal control problem of
continuous-time nonlinear systems with model uncertainty. To approximate the solution of the HJB equation,
an actor-critic-identifier (ACI) structure with three neural networks is used: the actor and critic networks
estimate the optimal control law and the optimal cost function, respectively, and the third network
identifies the system dynamics. Control structures that involve two or three neural networks can ensure good
performance for uncertain nonlinear systems, but they lead to a complex calculation process and inefficient
use of resources, which ultimately reduces the rate of convergence.

This paper develops an adaptive optimal controller for tracking control of a WMR system using the
online adaptive dynamic programming technique in cooperation with a disturbance observer. The control
scheme consists of two parts: an optimal component that optimizes the cost function, and a compensation
component that uses the estimated disturbances to remove the effects of model uncertainty and system
noise. The optimal controller is designed based on the value iteration (VI) method and simultaneously
updates the weight matrix. The stability of the whole system, comprising the optimal component and the
disturbance observer, is demonstrated in the uniformly ultimately bounded (UUB) sense. Finally, simulations
are performed to verify the algorithm. The simulation results show that the proposed scheme performs well
both under nominal conditions and in the presence of uncertainties and external disturbances.


2. SYSTEM MODELLING 
Consider the three-wheeled mobile robot shown in Figure 1, with two independent drive wheels at the rear
and one caster wheel at the front, subject to nonholonomic constraints. In Figure 1, $G$ is the WMR's center
of mass, $M(x_M, y_M)$ is the center of the axle connecting the two rear wheels, and $\theta$ is the WMR's
direction angle. Let $\dot{\phi}_R$ and $\dot{\phi}_L$ represent the angular velocities of the right and left
wheels, respectively. $\mu_R$ and $\mu_L$ represent the longitudinal slip of the right and left wheels, while
$\delta$ represents the lateral slip along the wheel shaft. Taking the effect of wheel slip into account, the
kinematic equation for the WMR is [17], [18]:

$$\begin{aligned} \dot{x}_M &= \beta\cos\theta - \dot{\delta}\sin\theta \\ \dot{y}_M &= \beta\sin\theta + \dot{\delta}\cos\theta \\ \dot{\theta} &= \omega \end{aligned} \tag{1}$$

where $\beta$ is the linear velocity perpendicular to the axis joining the two rear wheels and $\omega$ is the
angular velocity of the WMR.
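
As a rough illustration of how the slip-aware kinematic model (1) can be stepped numerically, the sketch below integrates it with a forward-Euler scheme. The function names, the step size, and the constant input values are assumptions made for this example only, and the heading rate is taken to equal the angular velocity of the WMR.

```python
import math


def wmr_kinematics(theta, forward_velocity, angular_velocity, lateral_slip_rate=0.0):
    """Right-hand side of the kinematic model with lateral slip.

    Returns (x_M_dot, y_M_dot, theta_dot) for heading `theta`.
    """
    x_dot = forward_velocity * math.cos(theta) - lateral_slip_rate * math.sin(theta)
    y_dot = forward_velocity * math.sin(theta) + lateral_slip_rate * math.cos(theta)
    return x_dot, y_dot, angular_velocity


def euler_step(pose, forward_velocity, angular_velocity, lateral_slip_rate, dt):
    """Advance the pose (x_M, y_M, theta) by one forward-Euler step of size dt."""
    x, y, theta = pose
    dx, dy, dtheta = wmr_kinematics(theta, forward_velocity, angular_velocity,
                                    lateral_slip_rate)
    return (x + dt * dx, y + dt * dy, theta + dt * dtheta)


# Illustrative run: drive straight along x for 1 s while slipping sideways.
pose = (0.0, 0.0, 0.0)
for _ in range(1000):
    pose = euler_step(pose, forward_velocity=1.0, angular_velocity=0.0,
                      lateral_slip_rate=0.1, dt=0.001)
```

With zero heading, the lateral slip rate shows up purely as drift along the y axis, which is the effect the slip term in (1) is meant to capture.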
The dynamic model of the WMR is given as (2) [17], [18]:

where $\tau_d$ is the input disturbance, $v = [\dot{\phi}_R \;\; \dot{\phi}_L]^T$, and $\mu = [\mu_R \;\; \mu_L]^T$.


[Figure 1 image; labelled parts: caster wheel, left wheel, right wheel]


Figure 1. WMR model and coordinate system 


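The objective of this study is to control the WMR, i.e. point $M(x_M, y_M)$, to track the reference
trajectory, represented by point $T(x_T, y_T)$, with minimal energy consumption. The position error between
the center of the robot (point $M$) and the target point (point $T$) is determined as (3):

$$e_p = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_T - x_M \\ y_T - y_M \end{bmatrix} = H \begin{bmatrix} x_T - x_M \\ y_T - y_M \end{bmatrix} \tag{3}$$

where $H$ is the transform matrix: $H = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$.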
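
The error transformation (3) rotates the world-frame position difference into the robot frame. A minimal sketch is given below; the function name and the test values are illustrative assumptions, not part of the original design.

```python
import math


def position_error(theta, robot_xy, reference_xy):
    """Rotate the world-frame position difference into the robot frame.

    Implements e_p = H(theta) * (reference - robot) with
    H = [[cos, sin], [-sin, cos]].
    """
    dx = reference_xy[0] - robot_xy[0]
    dy = reference_xy[1] - robot_xy[1]
    e1 = math.cos(theta) * dx + math.sin(theta) * dy    # longitudinal error
    e2 = -math.sin(theta) * dx + math.cos(theta) * dy   # lateral error
    return e1, e2


# Robot at the origin heading along +y; target one metre ahead of it.
e1, e2 = position_error(math.pi / 2, (0.0, 0.0), (0.0, 1.0))
```

In this example the target lies directly ahead of the robot, so the whole error appears in the first (longitudinal) component.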
The time derivative of (3) is obtained as (4):

$$\dot{e}_p = \begin{bmatrix} \dot{e}_1 \\ \dot{e}_2 \end{bmatrix} = \kappa v + \varepsilon_1 \tag{4}$$

where $\kappa$ and $\varepsilon_1$ are calculated as in [17], [18].
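Define the state variables $x_1 = e_p$ and $x_2 = \dot{x}_1 + \Lambda x_1$, where $\Lambda$ is a positive scalar.
Based on (2) and (4), the time derivatives of $x_1$ and $x_2$ have the following form:

$$\dot{x}_1 = \dot{e}_p = \kappa v + \varepsilon_1 \tag{5}$$

$$\dot{x}_2 = E x_2 - \Lambda E x_1 + Z\tau + d \tag{6}$$

where $E = E_1\kappa^{-1}$, $d = \varepsilon_2 - E_1\kappa^{-1}\varepsilon_1$, $E_1 = -\kappa M^{-1}B$, $Z = \kappa M^{-1}$, and
$\varepsilon_2 = -\kappa M^{-1}(Q\ddot{\mu} + C\dot{\delta} + G\ddot{\delta} + \tau_d) + \dot{\varepsilon}_1 + \dot{\kappa}v + \Lambda\kappa v + \Lambda\varepsilon_1$.

Rewriting (5) and (6) in the state space form, the following is obtained:

$$\dot{x} = f(x) + g_u u + g_d d \tag{7}$$

where $f(x) = \begin{bmatrix} x_2 - \Lambda x_1 \\ E x_2 - \Lambda E x_1 \end{bmatrix}$, $g_u = \begin{bmatrix} 0 \\ Z \end{bmatrix}$, $g_d = \begin{bmatrix} 0 \\ I \end{bmatrix}$, and $\tau = u$.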


Because system (7) is nonlinear and affected by the disturbance $d$, the controller $u$ is constructed as
follows to achieve optimal performance:

$$u = u_r(x) + u_d(x) \tag{8}$$

where $u_r(x)$ is the optimal control component, which will be designed using the ADP method, and $u_d(x)$
is the compensation control component.


3. ROBUST ADAPTIVE OPTIMAL CONTROLLER DESIGN FOR THE WMR

Standard ADP algorithms can be applied to nonlinear systems only when disturbances are ignored, which
limits their applicability in practice. Therefore, we propose to combine the ADP algorithm with a
disturbance observer to design a robust optimal tracking controller for a WMR that contains uncertain
components and is affected by disturbances. Figure 2 illustrates the structure of the proposed controller.
The controller consists of two parts: an optimal component $u_r(x)$ and a disturbance compensation
component $u_d(x)$. The detailed design of each part is presented in the following subsections.


[Figure 2 image; labelled block: disturbance compensation controller]


Figure 2. ADP control structure combined with disturbance observer 


3.1. The adaptive optimal controller 
In the case of $d = 0$, the nonlinear system (7) is rewritten as (9):

$$\dot{x} = f(x) + g_u(x)u_r \tag{9}$$

Assumption 1: $f(x)$ is Lipschitz continuous on the set $\Omega$, where $\Omega$ is the set of all possible
solutions of (9).
Define the cost function:

$$J(x) = \int_t^{\infty} r\big(x(\tau), u_r(\tau)\big)\,d\tau \tag{10}$$

where $r(x(t), u_r(t)) = x^T(t)Qx(t) + u_r^T(x)Ru_r(x)$, in which $Q \in \mathbb{R}^{2n\times 2n}$ and
$R \in \mathbb{R}^{n\times n}$ are symmetric positive definite matrices.
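
To make the quadratic running cost in (10) concrete, the following sketch evaluates $x^TQx + u^TRu$ on sampled data and accumulates it with the trapezoidal rule over a finite horizon. The helper names, the small dimensions, and the finite horizon are assumptions for illustration; the cost in (10) itself is defined over an infinite horizon.

```python
def quad_form(v, M):
    """Return v^T M v for a vector v and a square matrix M (nested lists)."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))


def running_cost(x, u, Q, R):
    """Running cost r(x, u) = x^T Q x + u^T R u."""
    return quad_form(x, Q) + quad_form(u, R)


def accumulated_cost(xs, us, Q, R, dt):
    """Trapezoidal approximation of the cost integral over the sampled horizon."""
    costs = [running_cost(x, u, Q, R) for x, u in zip(xs, us)]
    return sum(0.5 * (c0 + c1) * dt for c0, c1 in zip(costs, costs[1:]))


# Illustrative data: constant unit error, zero input, 1 s horizon (11 samples).
Q = [[1.0, 0.0], [0.0, 1.0]]
R = [[1.0]]
xs = [[1.0, 0.0]] * 11
us = [[0.0]] * 11
J = accumulated_cost(xs, us, Q, R, dt=0.1)
```

Larger entries in Q penalise tracking error more heavily, while larger entries in R penalise control energy, which is the trade-off the optimal component is designed to balance.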


The Hamilton function is defined as (11):

$$H(x, u_r, J_x) = \left(\frac{\partial J}{\partial x}\right)^T \dot{x} + r(x, u_r) = \left(\frac{\partial J}{\partial x}\right)^T \dot{x} + x^TQx + u_r^TRu_r \tag{11}$$

For the system (9) to have the optimal solution, there must exist a function $V(x, u_r)$ satisfying the HJB
equation:

$$H(x, u_r, V) = \left(\frac{\partial V}{\partial x}\right)^T \dot{x} + x^TQx + u_r^TRu_r = 0 \tag{12}$$

The optimal control signal $u_r^*$ is then determined by:

$$u_r^* = \arg\min_{u_r}\{H(x, u_r, V)\} \tag{13}$$

By solving (13) using (12), the following is obtained:

$$u_r^* = -\frac{1}{2}R^{-1}g_u^T\frac{\partial V}{\partial x} \tag{14}$$
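
The step from (13) to (14) can be sketched through the usual stationarity argument. The derivation below assumes that $\dot{x}$ in the Hamiltonian is given by the nominal dynamics (9) and that $R$ is symmetric positive definite, as stated for the cost function; it is offered as a worked illustration rather than as the authors' own derivation.

```latex
\begin{align*}
H(x,u_r,V) &= \left(\frac{\partial V}{\partial x}\right)^{T}\bigl(f(x)+g_u(x)u_r\bigr)
              + x^{T}Qx + u_r^{T}Ru_r, \\
\frac{\partial H}{\partial u_r} &= g_u^{T}(x)\frac{\partial V}{\partial x} + 2Ru_r = 0
\;\Longrightarrow\;
u_r^{*} = -\tfrac{1}{2}R^{-1}g_u^{T}(x)\frac{\partial V}{\partial x}, \\
\frac{\partial^{2} H}{\partial u_r^{2}} &= 2R \succ 0 .
\end{align*}
```

Because the second derivative is positive definite, the stationary point is the minimiser of the Hamiltonian with respect to $u_r$.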
However, the system (9) is nonlinear, so (12) cannot be solved directly, which means the function
$V(x, u_r)$ cannot be found by analytical methods. To overcome this difficulty, the function $V(x, u_r)$ is
approximated by a neural network as (15):

$$V(x, u_r) = W^T\Phi(x) + \varepsilon(x) \tag{15}$$

where $W$ is the weight matrix of the neural network, $\Phi(x)$ is the activation function, which is a function
of the state variable $x$, and $\varepsilon(x)$ is the approximation error.
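Using this approximation, the controller $u_r^*$ becomes:

$$u_r^* = -\frac{1}{2}R^{-1}g_u^T\frac{\partial V}{\partial x} = -\frac{1}{2}R^{-1}g_u^T\left[\left(\frac{\partial \Phi}{\partial x}\right)^T W + \frac{\partial \varepsilon}{\partial x}\right] \tag{16}$$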


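Unfortunately, the true value of $W$ in (15) is unknown; therefore, it is replaced by an estimate, and the
function $V(x, u_r)$ is likewise represented by its estimate, as (17):

$$\hat{V}(x, u_r) = \hat{W}^T\Phi(x) \tag{17}$$

where $\hat{W}$ is the estimate of $W$, which is updated by the following law:

$$\dot{\hat{W}} = -\alpha_1\frac{\sigma}{(\sigma^T\sigma + 1)^2}\left(\sigma^T\hat{W} + Q(x) + u_r^TRu_r\right) + \frac{\alpha_2}{2}\frac{\partial \Phi}{\partial x}g_u(x)R^{-1}g_u^T(x)x \tag{18}$$

in which $\sigma = \frac{\partial \Phi}{\partial x}\left(f(x) + g_u(x)u_r\right)$ and $Q(x) = x^TQx$.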
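
The following sketch shows one way the right-hand side of an update law of the form (18) could be evaluated. It assumes that the regressor is the basis-function Jacobian multiplied by the nominal state derivative and that the state penalty is the quadratic form of Q; the function name, the helper routines, and the scalar test system are hypothetical and chosen only to keep the example checkable by hand.

```python
def matvec(M, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def transpose(M):
    """Transpose of a nested-list matrix."""
    return [list(col) for col in zip(*M)]


def dot(a, b):
    """Inner product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def critic_weight_rate(W_hat, x, u, f_x, g_u, grad_phi, Q, R, R_inv, alpha1, alpha2):
    """Time derivative of the critic weights for the current sample."""
    # Nominal state derivative f(x) + g_u(x) u.
    x_dot = [fi + gi for fi, gi in zip(f_x, matvec(g_u, u))]
    # Regressor sigma = (dPhi/dx) * x_dot (assumed definition).
    sigma = matvec(grad_phi, x_dot)
    # Hamiltonian residual evaluated with the estimated weights.
    eps_h = dot(sigma, W_hat) + dot(x, matvec(Q, x)) + dot(u, matvec(R, u))
    norm = (dot(sigma, sigma) + 1.0) ** 2
    learning = [-alpha1 * s / norm * eps_h for s in sigma]
    # Second term: (alpha2 / 2) * dPhi/dx * g_u * R^{-1} * g_u^T * x.
    g_rinv_gt_x = matvec(g_u, matvec(R_inv, matvec(transpose(g_u), x)))
    stabilising = [0.5 * alpha2 * v for v in matvec(grad_phi, g_rinv_gt_x)]
    return [a + b for a, b in zip(learning, stabilising)]


# Scalar illustration: x_dot = -x + u, Phi(x) = x^2, evaluated at x = 1, u = 0.
rate = critic_weight_rate(
    W_hat=[1.0], x=[1.0], u=[0.0], f_x=[-1.0], g_u=[[1.0]],
    grad_phi=[[2.0]], Q=[[1.0]], R=[[1.0]], R_inv=[[1.0]],
    alpha1=0.25, alpha2=0.01,
)
```

In an online implementation this rate would be integrated alongside the plant state, for example with a fixed-step solver, so that the weights and the control law are updated simultaneously.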
Finally, the optimal controller $u_r$ is implemented as (19):

$$u_r = -\frac{1}{2}R^{-1}g_u^T\left(\frac{\partial \Phi}{\partial x}\right)^T\hat{W} \tag{19}$$

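
A control law of the form (19) reduces to a few matrix-vector products once the critic weights are available. The sketch below assumes a quadratic basis in two states and a single input purely for illustration; none of these choices come from the paper.

```python
def matvec(M, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def transpose(M):
    """Transpose of a nested-list matrix."""
    return [list(col) for col in zip(*M)]


def optimal_control(W_hat, g_u, grad_phi, R_inv):
    """Evaluate u_r = -1/2 * R^{-1} * g_u^T * (dPhi/dx)^T * W_hat."""
    grad_v = matvec(transpose(grad_phi), W_hat)      # estimated value gradient
    gt_grad_v = matvec(transpose(g_u), grad_v)       # projected onto the inputs
    return [-0.5 * v for v in matvec(R_inv, gt_grad_v)]


# Hypothetical basis Phi(x) = [x1^2, x1*x2, x2^2] evaluated at x = (1, 2).
grad_phi = [[2.0, 0.0],   # d(x1^2)/dx
            [2.0, 1.0],   # d(x1*x2)/dx
            [0.0, 4.0]]   # d(x2^2)/dx
u_r = optimal_control(W_hat=[1.0, 0.0, 1.0], g_u=[[0.0], [1.0]],
                      grad_phi=grad_phi, R_inv=[[0.5]])
```

Only the rows of the value gradient that are reached through the input matrix influence the control, which is why the first state does not contribute in this example.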
3.2. Observer based adaptive controller design 
Consider system (7), which is affected by the disturbance $d$:

$$\dot{x} = f(x) + g_u(x)u + g_d(x)d \tag{20}$$

As mentioned in section 2, the controller $u$ includes a compensation control component that compensates for
the effects of system uncertainties and disturbances. In this study, this control component is chosen as (21):

$$u_d(x) = -Z^{-1}\hat{d} \tag{21}$$

where $\hat{d}$ is the estimate of $d$, whose value is determined by the following observer [29]-[31]:

$$\begin{cases} \hat{d} = \eta + p(x) \\ \dot{\eta} = -h(x)\left(g_d\left[\eta + p(x)\right] + f(x) + g_u(x)u\right) \end{cases} \tag{22}$$

in which $\eta \in \mathbb{R}^l$ is the internal state of the observer, $p(x) \in \mathbb{R}^l$ is a designed vector,
and $h(x) = \frac{\partial p(x)}{\partial x}$ is the gain of the observer. The convergence of the observer is
presented in detail in [29]-[31].
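
The behaviour of an observer with the structure of (22) can be checked on a toy problem. The sketch below uses a scalar plant with a constant disturbance and the linear design choice p(x) = k x, so that h(x) = k; the plant, the gain, and the step size are all assumptions made for this example and are not the values used for the WMR.

```python
def observer_step(eta, x, u, f, g_u, g_d, k, dt):
    """One forward-Euler step of the observer internal state eta.

    Uses p(x) = k * x, hence h(x) = dp/dx = k (an assumed design choice).
    """
    d_hat = eta + k * x
    eta_dot = -k * (g_d * d_hat + f(x) + g_u * u)
    return eta + dt * eta_dot


def simulate(d_true=2.0, k=5.0, dt=1e-3, steps=3000):
    """Scalar plant x_dot = -x + u + d with a constant unknown disturbance d."""
    def f(x):
        return -x

    x, eta = 0.0, 0.0
    for _ in range(steps):
        u = 0.0
        eta_next = observer_step(eta, x, u, f, 1.0, 1.0, k, dt)
        x = x + dt * (f(x) + u + d_true)   # plant update with the true disturbance
        eta = eta_next
    return eta + k * x                      # final disturbance estimate


d_hat_final = simulate()
```

For this choice of p(x) the estimation error obeys a first-order decay with rate k, so a larger gain gives faster convergence at the cost of greater sensitivity to measurement noise.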


3.3. Stability of overall system 
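Substituting the control components (8) and (21) into system (20), the dynamics of the closed-loop
system are expressed as (23):

$$\dot{x} = f(x) + g_u u_r - g_u Z^{-1}\hat{d} + g_d(\hat{d} + \tilde{d}) \tag{23}$$

where $\tilde{d} = d - \hat{d}$ is the disturbance estimation error. Since $g_d = g_u Z^{-1}$, the following is obtained:

$$\dot{x} = f(x) + g_u u_r + g_d\tilde{d} \tag{24}$$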
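
The cancellation that leads from (23) to (24) can be written out step by step. The lines below assume the estimation error is defined as the true disturbance minus its estimate and that the input matrices have the block structure implied by (6) and (7), so that $g_u Z^{-1} = g_d$; both are stated here as assumptions.

```latex
\begin{align*}
\dot{x} &= f(x) + g_u\bigl(u_r - Z^{-1}\hat{d}\bigr) + g_d d \\
        &= f(x) + g_u u_r - g_d\hat{d} + g_d d
           \qquad \bigl(g_d = g_u Z^{-1}\bigr) \\
        &= f(x) + g_u u_r + g_d\tilde{d},
           \qquad \tilde{d} = d - \hat{d}.
\end{align*}
```

The compensation component therefore removes the estimated part of the disturbance, and only the estimation error enters the closed-loop dynamics.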


In order to demonstrate the stability of the system, which consists of the adaptive optimal controller, the
disturbance observer, and the WMR, the following Lyapunov function is chosen:

where $\tilde{W} = W - \hat{W}$, $X = [x \;\; u_r \;\; \tilde{d} \;\; \tilde{W}]$, and $v_i$ ($i = 1, 2, 3, 4$) are positive scalars.
Because $u_r^*$ is the solution of (10), the component $\int_t^{\infty}(x^TQx + u_r^{*T}Ru_r^*)\,dt$ is bounded. In
addition, the observer in section 3.2 is exponentially convergent, so the integral
$\int_t^{\infty}\tilde{d}^T\tilde{d}\,dt$ is convergent. This leads to the following result:

$$0 < L(X, t) < \eta_1\|X\| \tag{26}$$

where $\eta_1$ is a positive constant.
The time derivative of $L(X, t)$ is determined as (27):


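$$\begin{aligned} \dot{L} &= 2v_1x^T(f + g_uu_r + g_uZ^{-1}\tilde{d}) - v_2(x^TQx + u_r^TRu_r) - v_3\|\tilde{d}\|^2 - \frac{2v_4}{\alpha_1}\tilde{W}^T\dot{\hat{W}} \\ &= 2v_1x^T(f + g_uu_r) + 2v_1x^Tg_uZ^{-1}\tilde{d} - v_2(x^TQx + u_r^TRu_r) - v_3\|\tilde{d}\|^2 - \frac{2v_4}{\alpha_1}\tilde{W}^T\dot{\hat{W}} \end{aligned} \tag{27}$$

We have:

$$2v_1x^Tg_uZ^{-1}\tilde{d} \le v_1\|x\|^2 + v_1\|g_uZ^{-1}\|^2\|\tilde{d}\|^2 \tag{28}$$

$$x^TQx + u_r^TRu_r \ge \lambda_{\min}(Q)\|x\|^2 + \lambda_{\min}(R)\|u_r\|^2 \tag{29}$$

or

$$-v_2(x^TQx + u_r^TRu_r) \le -v_2\lambda_{\min}(Q)\|x\|^2 - v_2\lambda_{\min}(R)\|u_r\|^2 \tag{30}$$

Using (28) and (30), (27) yields:

$$\begin{aligned} \dot{L} \le{} & 2v_1\|x\|\|f(x)\| + 2v_1\|x\|\|g_u\|\|u_r\| + v_1\|x\|^2 + v_1\|g_uZ^{-1}\|^2\|\tilde{d}\|^2 \\ & - v_2\lambda_{\min}(Q)\|x\|^2 - v_2\lambda_{\min}(R)\|u_r\|^2 - v_3\|\tilde{d}\|^2 - \frac{2v_4}{\alpha_1}\tilde{W}^T\dot{\hat{W}} \end{aligned} \tag{31}$$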


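Using the updating law (18) leads to (32):

$$-\frac{2v_4}{\alpha_1}\tilde{W}^T\dot{\hat{W}} = 2v_4\tilde{W}^T\frac{\sigma}{(\sigma^T\sigma + 1)^2}\left(\sigma^T\hat{W} + Q(x) + u_r^TRu_r\right) - \frac{v_4\alpha_2}{\alpha_1}\tilde{W}^T\frac{\partial \Phi}{\partial x}g_u(x)R^{-1}g_u^T(x)x \tag{32}$$

Define $\varepsilon_H = \sigma^T\hat{W} + Q(x) + u_r^TRu_r$.
Then: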
Then: 
a Mery = oy, BAW oy ee Wega) R A (33) 
az ~ C4 a (eTe41) taz (oTo) E 4 xu 
24 TW < —2v,% F a ||_en_||? 
a2 W Ws 2v — az IleT enl Iw “+1, oTG+ zeal Iw “+ 14S aT rl 
-v WTP, gu RTE = o E S h IPE + El Mh - nogu gh (x)x (34) 


Also, because f(x) is Lipschitz then f(x) < k||x||where k is a positive constant. Substituting 


u, = —2R-1gh Cy W and (34) into (31), the following is obtained: 


L < ((2k + 1)vı — v2 Amin Ollx|l25||ue ll’ (v ||Z~ “till? ~~ v3)||d||° 


= ear + lp |S 


o 
seal i+ 
min A aTG+ 


oul? IR ND (35) 


a2 


ay 


V4 


Fen Ox 


Choose $v_1 = v_4$ and note that $\tilde{W} + \hat{W} = W$; then:

L < (2k + 1)v; — VeAmin Ollx ll? aller I? NZ Gull? = PDE pig) — % = Sal I + 
v || — vy ee ioco- (36) 

L <((2k + 1)v; - Varin OMe a PAE pi) + (v,||Z-*gull? — v3)||dl|° - 

al- Way? 4», @ A 37 

Va oy Teal WI + vac 1 mall va ee) (37) 
where A= ||Wrrar |GE) ome 

L S((2k + 1)¥y — V2Amin Olle? ui |" NZ gal = volal pin) — Pe =| | + 
Va a || F ae Fo IIxII*) (38) 
Define $L_{1x} = (2k + 1)v_1 - v_2\lambda_{\min}(Q)$; $L_{1u} = -v_2\lambda_{\min}(R)$; $L_{1d} = v_1\|g_uZ^{-1}\|^2 - v_3$;

ai 2 2 72 

Liw = =a Teall ; Lie = Zia wo = 
As a result, (38) is rewritten as (39):

$$\dot{L} \le L_{1x}\|x\|^2 + L_{1u}\|u_r\|^2 + L_{1d}\|\tilde{d}\|^2 + L_{1W}\|\tilde{W}\|^2 + L_{1\varepsilon} \tag{39}$$


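If $v_i$, $i = 1, \ldots, 4$, satisfy $v_2 > \frac{(2k + 1)v_1}{\lambda_{\min}(Q)}$, $v_3 > v_1\|g_uZ^{-1}\|^2$, $v_4 = v_1$, and

$$\|x\| > \sqrt{\frac{L_{1\varepsilon}}{-L_{1x}}} \quad\text{or}\quad \|u_r\| > \sqrt{\frac{L_{1\varepsilon}}{-L_{1u}}} \quad\text{or}\quad \|\tilde{d}\| > \sqrt{\frac{L_{1\varepsilon}}{-L_{1d}}} \quad\text{or}\quad \|\tilde{W}\| > \sqrt{\frac{L_{1\varepsilon}}{-L_{1W}}}$$

then

$$\dot{L} < \gamma\|X\|^2 \tag{40}$$

where $\gamma < \max\{L_{1x}, L_{1W}, L_{1d}, L_{1u}\}$ is a negative number.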


4. SIMULATIONS AND DISCUSSION 

To verify the optimal tracking control algorithm based on the ADP algorithm with the actor-critic
structure, we perform numerical simulations in MATLAB/Simulink with the WMR parameters given in
Table 1 and the following design parameters:


$\alpha_1 = 0.25$, $\alpha_2 = 0.01$, $\gamma = \mathrm{diag}([1000, 1000, 1000, 1, 1, 1, 1, 1])$


Table 1. Wheeled mobile robot parameters


Parameter                                         Symbol   Value
Mass of the platform                              m_g      30 kg
Mass of each wheel                                m_w      1 kg
Inertial moment of the platform                   I_c      15.625 kg·m²
Inertial moment of each wheel (rotation axis)     I_w      0.1 kg·m²
Inertial moment of each wheel (diameter axis)     I_p      0.0025 kg·m²
Distance between M and G                          a        0.3 m
Radius of the wheel shaft                         b        0.75 m
Radius of the wheel                               r        0.15 m
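
For readers who wish to reproduce the setup, the values of Table 1 can be collected in a small configuration object. The sketch below is written in Python rather than MATLAB and uses hypothetical field names; it also assumes that b denotes half the distance between the two drive wheels, which is what the standard slip-free differential-drive relation in the helper function requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WmrParameters:
    """Physical parameters of the WMR taken from Table 1 (SI units)."""
    platform_mass: float = 30.0            # kg
    wheel_mass: float = 1.0                # kg
    platform_inertia: float = 15.625       # kg m^2
    wheel_inertia_rotation: float = 0.1    # kg m^2, about the rotation axis
    wheel_inertia_diameter: float = 0.0025 # kg m^2, about the diameter axis
    a: float = 0.3                         # m, distance between M and G
    b: float = 0.75                        # m, assumed half axle length
    r: float = 0.15                        # m, wheel radius


def body_velocities(params, phi_dot_right, phi_dot_left):
    """Slip-free linear and angular velocity from the wheel angular velocities."""
    linear = params.r * (phi_dot_right + phi_dot_left) / 2.0
    angular = params.r * (phi_dot_right - phi_dot_left) / (2.0 * params.b)
    return linear, angular


params = WmrParameters()
straight = body_velocities(params, 10.0, 10.0)   # equal wheel speeds
turning = body_velocities(params, 10.0, 0.0)     # right wheel only
```

Freezing the dataclass keeps the parameters immutable during a simulation run, which helps avoid accidental changes between the two trajectory cases.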


The simulation is executed for two kinds of reference trajectory: a straight line and a circle. In each case,
the trajectory in the xy reference frame, the position error in the time domain, and the velocity error in the
time domain are illustrated. The simulation results are depicted in Figures 3, 4(a), 4(b), 5, 6(a), and 6(b).
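
The two reference trajectories can be generated with simple parametric functions, as sketched below. The speed, heading, radius, and angular rate are hypothetical defaults, since the numerical values used in the simulations are not listed here.

```python
import math


def line_reference(t, x0=0.0, y0=0.0, speed=1.0, heading=math.pi / 4):
    """Point on a straight-line reference at time t (hypothetical parameters)."""
    return (x0 + speed * t * math.cos(heading),
            y0 + speed * t * math.sin(heading))


def circle_reference(t, cx=0.0, cy=0.0, radius=1.0, rate=0.5):
    """Point on a circular reference at time t (hypothetical parameters)."""
    return (cx + radius * math.cos(rate * t),
            cy + radius * math.sin(rate * t))


line_point = line_reference(math.sqrt(2.0))
circle_start = circle_reference(0.0)
circle_point = circle_reference(3.7)
```

Sampling either function at the controller rate gives the target point that enters the position error definition.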

Figure 3. Response of the ADP controller with the straight-line reference trajectory

Figure 4. Tracking error: (a) position tracking error and (b) velocity errors with the straight-line reference
trajectory

Figure 5. Response of the ADP controller with the circular reference trajectory

Figure 6. Tracking error: (a) position tracking error and (b) velocity errors with the circular reference trajectory


The simulation results show that in the first stage the critic neural network is still in the learning
process, so the tracking quality is poor. However, after about 8 s, the optimal control law finishes the
learning process and converges to the optimal value. This improves the WMR's tracking quality, and the
WMR follows the reference trajectory. The tracking of the x and y coordinates and the direction angle
$\theta$ shows a large error in the initial state; however, after the learning period, the tracking error is
almost zero for all variables in both simulation cases.


5. CONCLUSION 

The present study introduces an approach that combines online adaptive dynamic programming with a
disturbance observer to address robust optimal control of nonlinear systems. The proposed approach uses a
single neural network, which reduces the computational overhead compared with structures based on two or
three networks. The stability of the entire system, comprising the optimal controller and the disturbance
observer, is proven using Lyapunov theory. Simulations were conducted to assess the proposed algorithm.
The results indicate that the observer-based adaptive dynamic programming method yields good tracking
performance for the wheeled mobile robot, even in the presence of system uncertainties and external
disturbances.

However, the above method still requires knowledge of the internal dynamics of the system in order to
update the controller parameters. In future work, we will use data on the system state to develop a
control algorithm that does not depend on the system's dynamic model.